mini-FAQ

Q: Why should one learn assembly language these days?

A: Unless you are an OS 18 developer, you probably don’t need to code in assembly: modern compilers are much better at
performing optimizations than humans 19. Also, modern CPUs 20 are very complex devices, and assembly knowledge doesn’t
really help one to understand their internals. That being said, there are at least two areas where a good understanding of
assembly can be helpful: first and foremost, security/malware research. It is also a good way to gain a better understanding
of your compiled code whilst debugging. This book is therefore intended for those who want to understand assembly
language rather than to code in it, which is why there are many examples of compiler output contained within.

Q: I clicked on a hyperlink inside a PDF document, how do I go back?

A: In Adobe Acrobat Reader, press Alt+LeftArrow.

Q: Your book is huge! Is there anything shorter?

A: There is a shortened, lite version here: http://beginners.re/#lite.

Q: I’m not sure if I should try to learn reverse engineering or not.

A: Perhaps, the average time to become familiar with the contents of the shortened LITE-version is 1-2 month(s). 

17 twitter.com/TanelPoder/status/524668104065159169
18 Operating System

19 A very good text about this topic: [Fog13b]

20 Central processing unit 
A: Send it to me by email (dennis(a)yurichev.com).

Everything is comprehended in comparison



Author unknown 

When the author of this book first started learning C and, later, C++, he used to write small pieces of code, compile them, and
then look at the assembly language output. This made it very easy for him to understand what was going on in the code that
he had written 22. He did it so many times that the relationship between the C/C++ code and what the compiler produced
was imprinted deeply in his mind. As a result, when looking at assembly he can instantly imagine a rough outline of the
original C code’s appearance and function. Perhaps this technique could be helpful for others.

Sometimes ancient compilers are used here, in order to get the shortest (or simplest) possible code snippet. 



Exercises 

When the author of this book studied assembly language, he also often compiled small C functions and then rewrote them
gradually to assembly, trying to make their code as short as possible. This probably is not worth doing in real-world
scenarios today, because it’s hard to compete with modern compilers in terms of efficiency. It is, however, a very good way
to gain a better understanding of assembly.

Feel free, therefore, to take any assembly code from this book and try to make it shorter. However, don’t forget to test what
you have written.



Optimization levels and debug information

Source code can be compiled by different compilers with various optimization levels. A typical compiler has about three such
levels, where level zero means optimization is disabled. Optimization can also be targeted towards code size or code speed.

A non-optimizing compiler is faster and produces more understandable (albeit verbose) code, whereas an optimizing compiler
is slower and tries to produce code that runs faster (but is not necessarily more compact).

In addition to optimization levels and direction, a compiler can include in the resulting file some debug information, thus
producing code for easy debugging.

One of the important features of the “debug” code is that it might contain links between each line of the source code and the
respective machine code addresses. Optimizing compilers, on the other hand, tend to produce output where entire lines
of source code can be optimized away and thus not even be present in the resulting machine code.

Reverse engineers can encounter either version, simply because some developers turn on the compiler’s optimization flags
and others do not. Because of this, we’ll try to work on examples of both debug and release versions of the code featured in
this book, where possible.
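
As a rough illustration of these settings, here is a minimal sketch. The function name sum_to_ten and the file name opt.c are hypothetical and not taken from the book. The GCC options mentioned in the comment (-O0, -O2, -Os, -g, -S) are standard ones, but what a particular compiler version actually emits for this code may differ from what the comments suggest.

```c
/* Hypothetical example (not from the book). Suggested experiments with GCC:
 *   gcc -O0 -S opt.c     no optimization: verbose, easy-to-follow output
 *   gcc -O2 -S opt.c     optimize for speed
 *   gcc -Os -S opt.c     optimize for size
 *   gcc -O0 -g -c opt.c  keep debug information (source line <-> address links)
 */
int sum_to_ten(void)
{
    int unused = 42;   /* has no effect on the result: an optimizer may drop it */
    int sum = 0;

    for (int i = 1; i <= 10; i++)
        sum += i;      /* an optimizer may fold the whole loop into a constant */

    (void)unused;      /* silences the "unused variable" warning */
    return sum;
}
```

Comparing the -O0 and -O2 listings of such a function is a quick way to see source lines disappear from the machine code.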



22 In fact, he still does it when he can’t understand what a particular bit of code does.
Chapter 1

A short introduction to the CPU 



The CPU is the device that executes the machine code a program consists of. 

A short glossary: 

Instruction : A primitive CPU command. The simplest examples include: moving data between registers, working with
memory, primitive arithmetic operations. As a rule, each CPU has its own instruction set architecture (ISA).

Machine code : Code that the CPU directly processes. Each instruction is usually encoded by several bytes.

Assembly language : Mnemonic code and some extensions like macros that are intended to make a programmer’s life
easier.

CPU register : Each CPU has a fixed set of general purpose registers (GPR): ≈8 in x86, ≈16 in x86-64, ≈16 in ARM. The
easiest way to understand a register is to think of it as an untyped temporary variable. Imagine if you were working
with a high-level PL 1 and could only use eight 32-bit (or 64-bit) variables. Yet a lot can be done using just these!
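
To make the “untyped temporary variable” idea more concrete, here is a small sketch in C. The helper names as_signed and as_float are hypothetical, and the float case assumes the platform uses IEEE 754 single precision. The point is only that the same 32 bits mean different things depending on which instruction (or, here, which type) interprets them.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: one 32-bit "register" value, viewed in different ways.
 * The register itself carries no type; the interpretation is chosen by the user. */

/* View the 32 bits as a signed two's-complement integer. */
int32_t as_signed(uint32_t reg)
{
    int32_t v;
    memcpy(&v, &reg, sizeof v);   /* copy the raw bits, no conversion */
    return v;
}

/* View the same 32 bits as a float (assumes IEEE 754 single precision). */
float as_float(uint32_t reg)
{
    float f;
    memcpy(&f, &reg, sizeof f);   /* copy the raw bits, no conversion */
    return f;
}
```

For instance, the bit pattern 0xFFFFFFFF is 4294967295 as an unsigned integer and -1 as a signed one, while 0x3F800000 is the float 1.0 under the IEEE 754 assumption.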

One might wonder why there needs to be a difference between machine code and a PL. The answer lies in the fact that
humans and CPUs are not alike. It is much easier for humans to use a high-level PL like C/C++, Java, Python, etc., but it is
easier for a CPU to use a much lower level of abstraction. Perhaps it would be possible to invent a CPU that can execute
high-level PL code, but it would be many times more complex than the CPUs we know of today. In a similar fashion, it
is very inconvenient for humans to write in assembly language, due to it being so low-level and difficult to write in without
making a huge number of annoying mistakes. The program that converts the high-level PL code into assembly is called a
compiler.



1.1 A couple of words about different ISAs 

The x86 ISA has always been one with variable-length opcodes, so when the 64-bit era came, the x64 extensions did not
impact the ISA very significantly. In fact, the x86 ISA still contains a lot of instructions that first appeared in the 16-bit 8086
CPU, yet are still found in the CPUs of today.



ARM is a RISC 2 CPU designed with constant-length opcodes in mind, which had some advantages in the past. In the very
beginning, all ARM instructions were encoded in 4 bytes 3. This is now referred to as “ARM mode”.

Then they thought it wasn’t as frugal as they first imagined. In fact, the most used CPU instructions 4 in real-world applications
can be encoded using less information. They therefore added another ISA, called Thumb, where each instruction was
encoded in just 2 bytes. This is now referred to as “Thumb mode”. However, not all ARM instructions can be encoded in just 2
bytes, so the Thumb instruction set is somewhat limited. It is worth noting that code compiled for ARM mode and Thumb
mode may of course coexist within one single program.

The ARM creators thought Thumb could be extended, giving rise to Thumb-2, which appeared in ARMv7. Thumb-2 still uses
2-byte instructions, but has some new instructions which have the size of 4 bytes. There is a common misconception that
Thumb-2 is a mix of ARM and Thumb. This is incorrect. Rather, Thumb-2 was extended to fully support all processor features
so it could compete with ARM mode, a goal that was clearly achieved, as the majority of applications for iPod/iPhone/iPad
are compiled for the Thumb-2 instruction set (admittedly, largely due to the fact that Xcode does this by default). Later
the 64-bit ARM came out. This ISA has 4-byte opcodes, and has no need for any additional Thumb mode. However,

1 Programming Language

2 Reduced instruction set computing 

3 By the way, fixed-length instructions are handy because one can calculate the next (or previous) instruction address without effort. This feature will be
discussed in the switch() operator ( 13.2.2 on page 164) section.

4 These are MOV/PUSH/CALL/Jcc 
the 64-bit requirements affected the ISA, resulting in us now having three ARM instruction sets: ARM mode, Thumb mode
(including Thumb-2) and ARM64. These ISAs intersect partially, but it can be said that they are different ISAs, rather than
variations of the same one. Therefore, we will try to add fragments of code in all three ARM ISAs in this book.

There are, by the way, many other RISC ISAs with fixed length 32-bit opcodes, such as MIPS, PowerPC and Alpha AXP. 
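
Footnote 3 remarks that fixed-length instructions make it easy to calculate the next or previous instruction address. A minimal sketch of that arithmetic for 4-byte ARM mode instructions could look as follows. The function names and the constant ARM_INSN_SIZE are hypothetical, chosen for illustration only.

```c
#include <stdint.h>

/* Hypothetical helpers: in ARM mode every instruction occupies exactly 4 bytes,
 * so instruction addresses can be computed with plain arithmetic. */
enum { ARM_INSN_SIZE = 4 };

/* Address of the instruction following the one at addr. */
uint32_t next_insn(uint32_t addr)
{
    return addr + ARM_INSN_SIZE;
}

/* Address of the instruction preceding the one at addr. */
uint32_t prev_insn(uint32_t addr)
{
    return addr - ARM_INSN_SIZE;
}

/* Address of the n-th instruction (counting from 0) of a block starting at base. */
uint32_t nth_insn(uint32_t base, uint32_t n)
{
    return base + n * ARM_INSN_SIZE;
}
```

With variable-length opcodes, as in x86, no such formula exists: the preceding instructions have to be decoded to learn where the next one starts.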
Chapter 2

The simplest Function 



The simplest possible function is arguably one that simply returns a constant value. Here it is:



Listing 2.1: C/C++ Code

int f()
{
    return 123;
}



Let’s compile it!



2.1 x86 

Here’s what both the optimizing GCC and MSVC compilers produce on the x86 platform:

Listing 2.2: Optimizing GCC/MSVC (assembly output)

f:
        mov     eax, 123
        ret



There are just two instructions: the first places the value 123 into the EAX register, which is used by convention for storing
the return value, and the second one is RET, which returns execution to the caller. The caller will take the result from the
EAX register.
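
From the C side, the other half of this convention might be sketched like this. The caller function is hypothetical and not part of the book’s listing. On x86 the value it receives from f() is the one left in EAX by the two instructions above, although an optimizing compiler may well inline the call and leave only a constant.

```c
/* The function from listing 2.1. */
int f(void)
{
    return 123;
}

/* Hypothetical caller: the result of f() arrives in the return-value register
 * (EAX on x86) and is then used like any other value. */
int caller(void)
{
    int result = f();   /* on x86: call f, then read EAX */
    return result + 1;
}
```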



2.2 ARM 

There are a few differences on the ARM platform: 



Listing 2.3: Optimizing Keil 6/2013 (ARM mode) ASM Output

f PROC
        MOV      r0,#0x7b ; 123
        BX       lr
        ENDP







ARM uses the register R0 for returning the results of functions, so 123 is copied into R0. 

The return address is not saved on the local stack in the ARM ISA, but rather in the link register, so the BX LR instruction
causes execution to jump to that address, effectively returning execution to the caller.

It is worth noting that MOV is a misleading name for the instruction in both x86 and ARM ISAs. The data is not in fact
moved, but copied.

2.3 MIPS 

There are two naming conventions used in the world of MIPS when naming registers: by number (from $0 to $31) or by
pseudoname ($V0, $A0, etc). The GCC assembly output below lists registers by number:



Listing 2.4: Optimizing GCC 4.4.5 (assembly output)

        j       $31
        li      $2,123          # 0x7b



...while IDA 1 does it by their pseudonames:

Listing 2.5: Optimizing GCC 4.4.5 (IDA)

        jr      $ra
        li      $v0, 0x7B



The $2 (or $V0) register is used to store the function’s return value. LI stands for “Load Immediate” and is the MIPS equivalent
to MOV.

The other instruction is the jump instruction (J or JR) which returns the execution flow to the caller, jumping to the address
in the $31 (or $RA) register. This is the register analogous to LR 2 in ARM.

You might be wondering why the positions of the load instruction (LI) and the jump instruction (J or JR) are swapped. This is
due to a RISC feature called “branch delay slot”. The reason this happens is a quirk in the architecture of some RISC ISAs
and isn’t important for our purposes: we just need to remember that in MIPS, the instruction following a jump or branch
instruction is executed before the jump/branch itself takes effect. As a consequence, branch instructions always swap places
with the instruction which must be executed beforehand.
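
The effect of the delay slot can be sketched with a toy interpreter. This is not a MIPS emulator: the three opcodes, the struct layout and the function names are all hypothetical, and the third instruction in the sample program is added only to show that execution never reaches it. Under these assumptions, the sketch shows why the LI placed after the jump still sets the return value.

```c
#include <stddef.h>

/* Toy model, not real MIPS: three opcodes are enough for the demonstration. */
enum op { OP_LI, OP_JR_RA, OP_NOP };

struct insn {
    enum op op;
    int reg;   /* destination register number, used by OP_LI */
    int imm;   /* immediate value, used by OP_LI */
};

/* Executes code until a "jr $ra" takes effect and returns the value of $2 ($v0).
 * The instruction right after the jump (the delay slot) is executed
 * before control actually leaves the function. */
int run(const struct insn *code, size_t n)
{
    int regs[32] = {0};
    int jump_pending = 0;

    for (size_t pc = 0; pc < n; pc++) {
        const struct insn *i = &code[pc];
        int in_delay_slot = jump_pending;

        switch (i->op) {
        case OP_LI:    regs[i->reg] = i->imm; break;
        case OP_JR_RA: jump_pending = 1;      break;
        case OP_NOP:                          break;
        }

        if (in_delay_slot)
            break;   /* delay slot finished: the jump now takes effect */
    }
    return regs[2];
}

/* The function from listing 2.4, plus one extra instruction that is never reached. */
int run_listing(void)
{
    static const struct insn code[] = {
        { OP_JR_RA, 0, 0   },   /* j  $31                        */
        { OP_LI,    2, 123 },   /* li $2,123  (in the delay slot) */
        { OP_LI,    2, 999 },   /* hypothetical, never executed   */
    };
    return run(code, sizeof code / sizeof code[0]);
}
```

Calling run_listing() yields 123: the load in the delay slot runs, and the instruction after it does not.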



2.3.1 A note about MIPS instruction/register names 

Register and instruction names in the world of MIPS are traditionally written in lowercase. However, for the sake of
consistency, we’ll stick to using uppercase letters, as it is the convention followed by all other ISAs featured in this book.



1 Interactive Disassembler and debugger developed by Hex-Rays 

2 Link Register 
Chapter 3

Hello, world! 



Let’s use the famous example from the book “The C Programming Language” [Ker88]:

#include <stdio.h>

int main()
{
    printf("hello, world\n");
    return 0;
}



3.1 x86 

3.1.1 MSVC 

Let’s compile it in MSVC 2010: 
cl 1.cpp /Fa1.asm

(The /Fa option instructs the compiler to generate an assembly listing file.)

Listing 3.1: MSVC 2010

CONST   SEGMENT
$SG3830 DB      'hello, world',
CONST   ENDS
PUBLIC  _main
EXTRN   _printf:PROC
; Function compile flags: /Odtp
_TEXT   SEGMENT
_main   PROC
        push    ebp
        mov     ebp, esp
        push    OFFSET $SG3830
        call    _printf
        add     esp, 4
        xor     eax, eax
        pop     ebp
        ret     0
_main   ENDP
_TEXT   ENDS





MSVC produces assembly listings in Intel syntax. The difference between Intel syntax and AT&T syntax will be discussed
in 3.1.3 on page 9.

The compiler generated the file 1.obj, which is to be linked into 1.exe. In our case, the file contains two segments:
CONST (for data constants) and _TEXT (for code).

The string hello, world in C/C++ has type const char [] [Str13, p176, 7.3.2], but it does not have its own name. The
compiler needs to deal with the string somehow, so it defines the internal name $SG3830 for it.

That is why the example may be rewritten as follows: